
Microbial Genomics

Microbiology Society

Preprints posted in the last 30 days, ranked by how well they match Microbial Genomics's content profile, based on 204 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.

1
Assessment of Oxford Nanopore whole genome sequencing for large-scale genomic characterisation of Staphylococcus aureus

Haugan, I.; Flatby, H. M.; Lysvand, H.; Skei, N. V.; Zaragkoulias, K.; Solligard, E.; Ronning, T. G.; Olsen, L. C.; Damas, J. K.; Afset, J. E.; As, C. G.

2026-04-01 genomics 10.64898/2026.03.30.715209 medRxiv
Top 0.1% · Match score: 40.7%

Whole-genome sequencing (WGS) is increasingly being utilised in microbial diagnostics, surveillance, and research. In this paper we assess the performance of one leading long-read sequencing technology, Oxford Nanopore Technology (ONT), on 836 Staphylococcus aureus bacteraemia isolates and compare the results with those of a leading short-read sequencing technology, Illumina. All isolates were sequenced using ONT MinION Mk1B and Illumina HiSeq or MiSeq. Libraries were prepared according to the manufacturers' instructions. Preprocessing and downstream bioinformatic analyses were performed using a combination of in-house pipelines and publicly available software tools. The average base substitution error rate in ONT assemblies was low but varied between sequence types, possibly due to lineage-specific methylation patterns. Multilocus sequence typing results were similar between the technologies, while ONT assemblies allowed better spa typing than Illumina assemblies. The reported detection rate was similar between ONT and Illumina assemblies for most virulence- and AMR-associated genes and variants. For 42 (22.2%) of 189 genes/variants, the two technologies disagreed on gene detection in 5 or more isolates, and for 39 of these (20.6% of the 189) the higher detection rate was found with ONT. Discrepancies were mainly associated with low GC content, multiple repetitive segments, and small plasmids. Polishing of ONT data resulted in minor changes in gene/variant calling. Our study supports the use of ONT WGS for bacterial population genomic studies on a large collection of S. aureus isolates. Although assembly of ONT reads has its own methodological limitations, ONT assemblies outperformed Illumina assemblies in detecting potentially clinically relevant genes and variants, with a low read error rate. Understanding the advantages and limitations of WGS technologies is essential before undertaking studies involving such methods on large sets of bacteria.
Author summary
In this paper, we present a practical assessment of one important whole genome sequencing (WGS) method, Oxford Nanopore Technology (ONT), and compare its performance in bacterial population genomics to that of WGS with Illumina technology. Our goal was to investigate the usefulness of ONT in studies aiming to identify clinically relevant bacterial characteristics in large collections of bacteria, such as genotype-phenotype studies. We sequenced a large set of clinical S. aureus isolates from episodes of bloodstream infections using both ONT and Illumina technologies and performed analyses with widely used software and bioinformatic pipelines. We have elucidated inherent strengths and limitations of ONT and Illumina sequencing and report some of the practical consequences of these on bacterial typing and detection of clinically relevant genes. With this study, we present one of the most comprehensive assessments of long-read sequencing technology for the genomic characterisation of clinical bacterial isolates, and the findings provide guidance for researchers considering WGS in large-scale bacterial genomics.
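The discordance criterion in the abstract (the two technologies disagreeing on gene detection in 5 or more isolates) can be sketched as a simple tally. This is a minimal illustration assuming per-isolate presence/absence calls from both platforms are already available; the data layout, the function name `discordant_genes` and the toy calls are hypothetical and are not taken from the authors' pipelines.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: gene/variant -> one (ONT call, Illumina call) pair per isolate.
Calls = Dict[str, List[Tuple[bool, bool]]]


def discordant_genes(calls: Calls, min_isolates: int = 5) -> Dict[str, str]:
    """Flag genes whose ONT and Illumina calls disagree in >= min_isolates
    isolates and report which platform detected the gene more often."""
    flagged: Dict[str, str] = {}
    for gene, pairs in calls.items():
        # An isolate is discordant when exactly one platform detects the gene.
        discordant = sum(1 for ont, ill in pairs if ont != ill)
        if discordant < min_isolates:
            continue
        ont_hits = sum(1 for ont, _ in pairs if ont)
        ill_hits = sum(1 for _, ill in pairs if ill)
        if ont_hits > ill_hits:
            flagged[gene] = "ONT"
        elif ill_hits > ont_hits:
            flagged[gene] = "Illumina"
        else:
            flagged[gene] = "tie"
    return flagged


if __name__ == "__main__":
    toy_calls: Calls = {
        "geneA": [(True, False)] * 6 + [(True, True)] * 10,
        "geneB": [(False, True)] * 2 + [(True, True)] * 14,
        "geneC": [(False, True)] * 5 + [(True, False)] * 1,
    }
    print(discordant_genes(toy_calls))
```

With the toy calls, geneA is flagged as ONT-favoured and geneC as Illumina-favoured, while geneB is not flagged because only two isolates are discordant.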

2
Salmonella Genomic Markers for Risk to Food Safety

Waters, E. V.; Hill, C.; Orzechowska, B.; Cook, R.; Jorgensen, F.; Chattaway, M. A.; Langridge, G. C.

2026-03-30 genomics 10.64898/2026.03.27.714810 medRxiv
Top 0.1% · Match score: 33.6%

Foodborne non-typhoidal Salmonella remains a major public health concern, yet routine surveillance recovers large numbers of isolates from food that are not associated with human illness. Studies have shown that foodborne isolates can be genetically linked to clinical cases, highlighting a critical challenge for risk assessment and outbreak prioritisation. This study aimed to determine whether genomic markers can distinguish foodborne Salmonella strains with an increased likelihood of causing infection. Whole-genome sequencing data from over 900 Salmonella isolates recovered from food and the environment through UK Health Security Agency surveillance were analysed using hierarchical clustering to define genetically related groups. These clusters were expanded using the global EnteroBase database to provide broader epidemiological context. Genome-wide association analyses identified genetic markers associated with clusters containing clinical isolates, including phage-associated regions. A highly conserved 7 kb marker identified in S. Agona demonstrated strong predictive performance at a global scale, with high sensitivity and specificity for infection-associated lineages and strict serovar restriction. Comparative genomic analysis revealed that all markers localised to a shared chromosomal hotspot corresponding to a prophage integration site. The 7 kb risk-associated marker formed part of a larger prophage closely related to the well-characterised S. Typhimurium Fels-2 phage, which encodes a DNA invertase linked to phase variation, a mechanism known to promote phenotypic heterogeneity and host adaptation. As these S. Agona isolates are monophasic, our findings indicate that the genome-wide association approach has rediscovered a DNA invertase known to contribute to infection risk, but in a different serovar and via an alternative regulatory mechanism.
Overall, this work demonstrates the potential to move beyond treating all foodborne Salmonella isolates as equivalent hazards, towards a genomics-informed framework for risk stratification. This approach provides a foundation for improved risk-based decision-making, enhanced outbreak investigations and earlier prioritisation of public health responses during Salmonella surveillance and control.
Author summary
Foodborne Salmonella infections remain a major public health concern, but not all strains pose the same risk to human health. Here we investigated whether genetic differences could explain why some foodborne strains are more likely to cause human infection. We analysed over 900 genomes from food and environmental sources, grouping closely related strains before placing them in a global context using EnteroBase. By combining pangenome and genome-wide association analyses, we identified distinct lineages within several serovars that differed in their association with human cases. In Salmonella Agona, all clinical isolates belonged to a single lineage carrying a highly conserved 7 kb marker that was absent from low-risk strains. This marker demonstrated strong sensitivity and specificity across global datasets and was located within a prophage closely related to the well-characterised Fels-2 phage. This region encodes a DNA invertase previously linked to phase variation, a mechanism that promotes bacterial adaptability. Our findings indicate that infection risk can be structured at the lineage level and influenced by mobile genomic elements, particularly prophages, that enhance environmental persistence and host adaptation. This work advances genomic surveillance from retrospective linkage towards mechanistic and predictive risk assessment, with direct relevance for supporting risk-based decision-making during outbreak investigations.
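The sensitivity and specificity reported for the S. Agona marker follow the standard confusion-matrix definitions. The sketch below assumes each genome has already been labelled for marker presence and for membership of an infection-associated lineage; the function name and the counts are hypothetical, not data from the study.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(records: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """records: one (marker_present, infection_associated) pair per genome.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for marker, infection in records:
        if infection:
            if marker:
                tp += 1  # marker found in an infection-associated genome
            else:
                fn += 1  # infection-associated genome missed by the marker
        else:
            if marker:
                fp += 1  # marker found in a non-associated genome
            else:
                tn += 1  # marker correctly absent
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    genomes = ([(True, True)] * 8 + [(False, True)] * 2
               + [(False, False)] * 18 + [(True, False)] * 2)
    print(sensitivity_specificity(genomes))  # (0.8, 0.9)
```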

3
Mobile element-mediated carbapenem resistance in Enterobacter hormaechei in a Nigerian intensive care unit

Mba, I. E.; Odih, E. E.; Adekanmbi, O.; Oaikhena, A. O.; Sunmonu, G. T.; Adebiyi, I.; Gbaja, A. T.; Animashaun, O.; Osadebamwen, P.; Idowu, O.; Aanensen, D. M.; Okeke, I. N.

2026-04-10 microbiology 10.64898/2026.04.09.712135 medRxiv
Top 0.1% · Match score: 32.7%

Carbapenem-resistant Gram-negative bacteria pose a critical public health threat, yet the role of mobile genetic elements in driving their transmission and persistence remains poorly defined. In 2022, we investigated a suspected outbreak of carbapenem-resistant Acinetobacter baumannii (CRAB) in a Nigerian adult intensive care unit (ICU) using short-read whole genome sequencing (WGS) of carbapenem-resistant clinical and environmental isolates collected during the cluster period. Mobile element dynamics were then inferred from hybrid assemblies of Illumina and Oxford Nanopore reads. WGS ruled out the suspected CRAB outbreak, but a carbapenem-resistant Enterobacter hormaechei ST114 bloodstream isolate was found to be indistinguishable from two environmental isolates, all recovered during the Acinetobacter surge. Hybrid assemblies revealed a strikingly conserved ~19 kb resistance island shared across all ST114 genomes. The island, located on a 46,176 bp plasmid, contained a blaNDM-5 cassette alongside many other antimicrobial resistance genes within class 1 integrons and was flanked by insertion sequences. Using the hybrid assembly of the ST114 plasmid as a scaffold, the same plasmid was identified in the genome of a Klebsiella pneumoniae ST15 isolate from the ICU environment during the same period. Additionally, re-interrogation of genomic surveillance data uncovered four clonal ST109 Enterobacter bloodstream isolates, collected at the same facility in 2020, that carried the resistance genes in the same genetic context on a large 267,242 bp plasmid. Carbapenem resistance in hospital Enterobacterales is driven by both clonal expansion and horizontal spread of mobile resistance elements. These findings underscore the need to track mobile elements alongside bacterial lineages to inform evidence-based infection control, especially in low-resource settings.
Impact Statement
Carbapenem resistance among Enterobacterales remains a major public health threat, yet how mobile genetic elements contribute to their persistence and spread in hospital settings is still poorly understood. In this study, we investigated a suspected outbreak of carbapenem-resistant Acinetobacter baumannii in an adult intensive care unit in Nigeria. Although the outbreak was eventually ruled out, the genomic analysis showed the importance of careful interpretation of suspected outbreaks in hospital settings. Our findings highlight the importance of close monitoring of ICU environments, the implementation of blood culture-based diagnostics, and the value of genomic support in outbreak investigations. These findings demonstrate that carbapenem resistance in hospital Enterobacterales is driven not only by clonal expansion but also by the horizontal dissemination of a highly stable blaNDM-5-associated MDR island capable of integrating into diverse plasmid backbones. This study emphasizes the need for genomic surveillance that tracks both mobile elements and bacterial lineages to strengthen outbreak investigations, especially in low-resource settings. It further underscores the links between clinical and environmental AMR reservoirs and reinforces the value of a One Health approach to controlling carbapenem resistance.
Data summary
FASTQ sequences were deposited in the NCBI BioSample database under accession numbers SAMN55915584 to SAMN55915597.
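The abstract identifies the same plasmid in a K. pneumoniae genome by using the ST114 plasmid assembly as a scaffold. A quick, alignment-free approximation of that kind of check, swapped in here for illustration and not the authors' method, is k-mer containment: the fraction of the reference plasmid's k-mers that occur on either strand of a query assembly. The function names, the k = 21 default and the toy sequences are assumptions.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(reference: str, contigs: Iterable[str], k: int = 21) -> float:
    """Fraction of the reference plasmid's k-mers found in a query assembly,
    checking both strands of every contig."""
    ref_kmers = kmer_set(reference, k)
    if not ref_kmers:
        return 0.0
    query: Set[str] = set()
    for contig in contigs:
        query |= kmer_set(contig, k)
        query |= kmer_set(revcomp(contig), k)
    return len(ref_kmers & query) / len(ref_kmers)


if __name__ == "__main__":
    toy_plasmid = "ACGTTGCAAGGCTTACGATCGATCGGATCCA"  # hypothetical sequence
    print(containment(toy_plasmid, [revcomp(toy_plasmid)], k=5))
```

A containment close to 1 suggests the query assembly carries most of the reference plasmid; confirming the same genetic context still requires alignment or synteny comparison.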

4
Refining the Serine Protease Autotransporters of Enterobacteriaceae (SPATE) gene detection in Enteroaggregative Escherichia coli genomes uncovers differential SPATE distribution by phylogeny

Dada, R. A.; Afolayan, A. O.; Adewuyi, O. A.; Tytler, B. A.; Olayinka, B. O.; Thomson, N. R.; Okeke, I. N.

2026-04-16 microbiology 10.64898/2026.04.16.715897 medRxiv
Top 0.1% · Match score: 22.1%

Background
Enteroaggregative Escherichia coli (EAEC) are a heterogeneous pathotype implicated in acute and persistent diarrhoea, especially in developing countries. Serine Protease Autotransporters of Enterobacteriaceae (SPATEs) are trypsin-like proteases secreted via the type V secretion system and are repeatedly reported from EAEC. This study aimed to determine the prevalence of SPATE-encoding genes among EAEC and their association with diarrhoea. We screened 881 EAEC genomes from four recent epidemiological studies in Nigeria for 23 SPATE-encoding genes, initially using ARIBA and the VirulenceFinder database.
Results
Initial screening inflated SPATE gene content, particularly in genomes with multiple SPATEs, due to cross-detection of highly similar sequences and other artefacts. We developed and validated a refined methodology, which detected 478 of the 1,156 original SPATE calls and also identified SPATE miscalls in previously published datasets. The most prevalent SPATE-encoding gene in our EAEC collection was sepA (297; 33.71%), closely followed by sat (360; 29.74%). pic, encoding a SPATE with mucinase activity, was found in 65 (7.4%) genomes and was associated with diarrhoea (p = 0.00004). EAEC strains belonging to E. coli phylogroups A, B1 or C carried, on average, one SPATE gene per genome, whereas more than one was typically detected in phylogroup B2 EAEC. Other EAEC carried few or no SPATE genes.
Conclusions
Our study shows that multifunctional genome analysis tools may need to be refined for certain gene families to avoid overestimation. SPATEs are not as prevalent as previously thought, but they remain common among EAEC, particularly in phylogroups A, B1, B2 and C, pointing to the possibility that they make lineage-specific contributions to disease.
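The results attribute inflated SPATE calls to cross-detection of highly similar sequences. One common way to suppress such artefacts, shown here as a hypothetical sketch and not the authors' refined method, is to keep only the highest-identity hit among overlapping hits at the same locus. The `Hit` structure, half-open coordinates, the 0.5 overlap threshold and the toy hits are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    gene: str
    contig: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    identity: float  # percent identity to the reference allele


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Overlap length divided by the shorter hit's length (0 on different contigs)."""
    if a.contig != b.contig:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    shorter = min(a.end - a.start, b.end - b.start)
    return max(overlap, 0) / shorter if shorter > 0 else 0.0


def resolve_cross_hits(hits: List[Hit], min_overlap: float = 0.5) -> List[Hit]:
    """Greedy filter: visit hits from best to worst identity and drop any hit
    that overlaps an already-kept hit by at least min_overlap."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.identity, reverse=True):
        if all(overlap_fraction(hit, k) < min_overlap for k in kept):
            kept.append(hit)
    return kept


if __name__ == "__main__":
    toy_hits = [
        Hit("sat", "contig1", 100, 3900, 99.5),
        Hit("pet", "contig1", 120, 3880, 88.0),  # weaker hit at the same locus
        Hit("pic", "contig2", 500, 4600, 98.0),
    ]
    print([h.gene for h in resolve_cross_hits(toy_hits)])
```

In the toy example the weaker pet hit is discarded because it sits almost entirely within the sat locus, while pic on a different contig is retained.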

5
The pQBR mercury resistance plasmids: a model set of sympatric environmental mobile genetic elements

Orr, V. T.; Harrison, E.; Rivett, D. W.; Wright, R. C. T.; Hall, J. P. J.

2026-03-27 microbiology 10.64898/2026.03.27.714766 medRxiv
Top 0.1% · Match score: 21.9%

Plasmids are extrachromosomal mobile genetic elements that can facilitate rapid bacterial adaptation by transferring genes between individuals. While plasmids are known to exist in diverse habitats and encode a range of traits, most of our knowledge about plasmids comes from clinically associated antimicrobial resistance (AMR) plasmids that have already been recruited as vectors of drug resistance and have likely been shaped by strong selection for plasmid-encoded resistance. Here, we investigated 26 plasmids from the pQBR collection (a set of large, co-existing mercury resistance environmental plasmids isolated in Pseudomonas spp. from a field in Oxfordshire in the 1990s) and explored the ability of pQBR plasmids to mobilise novel chromosomally encoded traits. New whole genome sequences for 25 plasmids confirmed that these soil-isolated plasmids are generally very large (140-588 kb), constitute at least five distinct genetic groups, and have relatives in various other Pseudomonas species and habitats. Despite significant nucleotide-level divergence, Groups I (pQBR103-like, ~406 kb) and IV (pQBR57-like, ~328 kb) showed remarkable ancient similarities in synteny and gene content, both with one another and with the PInc-2 family of plasmids known to mobilise clinically significant drug resistance in Pseudomonas aeruginosa. None of the pQBR plasmids sequenced to date harboured known AMR determinants, but putative phage defence systems and metal resistances were evident. Transposable elements, including the Tn5042 mercury resistance transposon, were responsible for significant structural variation within plasmid groups, consistent with a predominant role of transposons in rapidly remodelling plasmids.
To experimentally test the ability of pQBR plasmids to spread new traits, we developed a novel transposon mobilisation assay, which showed that certain Group IV pQBR plasmids were especially effective at acquiring the chromosomally encoded transposon Tn6291 and that this mobilisation was likely due to specific plasmid factors rather than generic conjugation rate. Our work presents a tractable set of sequenced plasmids suitable for exploring the evolution and dynamics of gene acquisition by pre-AMR plasmids, and provides a key case study highlighting the pervasive interplay between plasmids and transposable elements that can drive microbial genome evolution.
Repositories: github.com/jpjh/PQBR_PLASMIDS
Impact statement
Plasmids can drive microbial evolution by acting as vectors for horizontal gene transfer. Because of their central role in disseminating antimicrobial resistance (AMR), plasmids are mainly explored as vehicles for AMR traits, meaning that our knowledge of the diversity and evolutionary dynamics of non-AMR plasmids is more limited. Here, we explore sequences from a set of mercury resistance plasmids isolated in Pseudomonas spp. from pristine agricultural land that lack AMR determinants. By providing new whole genome sequencing analyses, we expand the set of sequenced pQBR plasmids to 26, finding globally dispersed relatives from clinical, environmental, and industrial settings, and identifying an ancient plasmid backbone shared amongst divergent modern environmental and clinical AMR plasmids. We experimentally verify the role of pQBR plasmids in readily mobilising chromosomal traits using a novel transposon mobilisation assay, which suggests that specific plasmid-transposon interactions may drive trait spread. Overall, our work expands our understanding of the role of environmental plasmids in mobilising and disseminating adaptive traits.
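The assay result is framed as plasmid-specific transposon capture "rather than generic conjugation rate". One hypothetical way to separate the two, assuming transconjugants carrying the transposon can be counted separately (this is not the published assay protocol), is to normalise transposon-carrying transconjugants by all transconjugants rather than by recipients. The function name and counts are illustrative only.

```python
from typing import Tuple


def mobilisation_metrics(recipients: int, transconjugants: int,
                         tn_transconjugants: int) -> Tuple[float, float]:
    """Return (conjugation frequency, transposon capture fraction).

    conjugation frequency = transconjugants / recipients
    capture fraction      = transposon-carrying transconjugants / all transconjugants
    Dividing by transconjugants separates how often a plasmid picks up the
    transposon from how often the plasmid itself transfers.
    """
    if recipients <= 0 or transconjugants <= 0:
        raise ValueError("recipient and transconjugant counts must be positive")
    return transconjugants / recipients, tn_transconjugants / transconjugants


if __name__ == "__main__":
    # Hypothetical plate counts for two plasmids.
    print("plasmid X", mobilisation_metrics(100_000_000, 100_000, 100))
    print("plasmid Y", mobilisation_metrics(100_000_000, 10_000, 100))
```

In the toy comparison, plasmid Y transfers ten times less often than plasmid X but carries the transposon in ten times more of its transconjugants, the kind of pattern that points to plasmid factors rather than conjugation rate alone.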

6
Genomic characterization of Escherichia coli and Enterobacter hormaechei clinical isolates from a tertiary healthcare facility in Kenya

Musundi, S.; Kimani, R. W.; Waweru, H. K.; Wakaba, P.; Mbogo, D.; Essuman, S.; Onyambu, F.; Kanoi, B. N.; Gitaka, J.

2026-04-15 bioinformatics 10.64898/2026.04.13.718279 medRxiv
Top 0.1% · Match score: 19.3%

Extended-spectrum beta-lactamase-producing Enterobacterales such as Escherichia coli and Enterobacter hormaechei represent a growing public health challenge in clinical settings, particularly in low- and middle-income countries, due to the escalating threat of antimicrobial resistance (AMR). In this study, we aimed to identify the antibiotic resistance genes present in E. coli (n=4) and E. hormaechei (n=3) clinical isolates. Multidrug-resistant phenotypes were confirmed using disc diffusion assays against 20 antibiotics. Whole-genome sequencing of resistant isolates was performed using Oxford Nanopore Technologies. Genome assembly and analysis revealed high-risk clones, including sequence type (ST) 1193 in E. coli and ST78 in E. hormaechei. All E. coli isolates harbored the blaCTX-M gene on the chromosome along with point mutations conferring resistance to fluoroquinolones, while E. hormaechei isolates encoded chromosomal blaACT. Additionally, both species carried plasmids with multiple antibiotic resistance genes, including blaOXA and blaTEM, co-located with metal resistance operons, indicating the potential for horizontal gene transfer. BLAST analysis revealed high sequence similarity between the plasmids identified in the clinical isolates and plasmids previously recovered from environmental sources, highlighting the role of environmental reservoirs in AMR dissemination. Notably, no carbapenem resistance genes were detected in any isolate. These findings underscore the growing threat posed by multidrug-resistant Enterobacterales in clinical settings and emphasize the urgent need for strengthened infection prevention and control measures to mitigate AMR spread.

7
Genomic epidemiology of the 2017-2023 outbreak of Mycoplasma bovis sequence type ST21 in New Zealand

French, N. P.; Burroughs, A.; Binney, B.; Bloomfield, S.; Firestone, S. M.; Foxwell, J.; Gias, E.; Sawford, K.; van Andel, M.; Welch, D.; Biggs, P. J.

2026-04-10 genomics 10.64898/2026.04.07.717125 medRxiv
Top 0.1% · Match score: 19.0%

Mycoplasma bovis was first detected in cattle in New Zealand in 2017, prompting an eradication programme that incorporated extensive surveillance and a test-and-cull policy. Genome sequence data and phylodynamic models were used to inform decision making throughout the eradication programme. Isolates from 697 cattle on 126 farms were collected and sequenced between July 2017 and December 2023. Phylodynamic models were used to estimate the time to the most recent common ancestor, the effective reproduction number (Reff), the effective population size, and long-range and local between-farm transmission dynamics. The analysis revealed the dramatic impact of movement restrictions and culling up to early 2020, with a sharp reduction in Reff to less than 1 in 2018/19 and the extinction of two of three major lineages in 2020. This was followed by three years of residual infection on farms in the South Island, associated with persistent infection of a large feedlot farm and nearby farms. The comprehensive genomic and epidemiological dataset provided a unique opportunity to study the dynamics of a country-wide outbreak of a single-host pathogen from first detection to potential eradication, underlining the utility of integrated genomic surveillance during an outbreak response.
Author summary
The economically important cattle pathogen Mycoplasma bovis was first detected in New Zealand in 2017. This led to a large-scale, successful control programme aimed at eradicating the pathogen. The decision to undertake an eradication programme was informed by initial analyses of whole genome sequences from isolates collected as part of the surveillance programme. The analysis showed that the bacterium had entered New Zealand relatively recently and was unlikely to be widespread.
Over the subsequent years, genome sequencing and modelling of transmission dynamics informed important policy decisions made by the New Zealand Government and the cattle industry, and helped to monitor progress of the eradication programme. The impact of the detection, movement control and culling programme was profound, with sharp reductions in transmission between 2018 and 2020. This was followed by a long tail of localised infection in the South Island, involving transmission from a large feedlot farm. Provisional eradication was achieved after depopulation of this feedlot. This analysis highlights the role of genomic surveillance and modelling to inform decision making during an infectious disease outbreak.
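The time to the most recent common ancestor above comes from phylodynamic models. A much simpler diagnostic that is often run alongside such models, and which is not the authors' method, is root-to-tip regression: on a rooted tree, regress each tip's root-to-tip distance on its sampling date; the slope approximates the substitution rate and the x-intercept approximates the TMRCA. This sketch assumes the distances have already been read from a tree; the function name and toy values are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of root-to-tip distance on sampling date.

    Returns (rate, tmrca): the slope, in distance units per year, and the
    date at which the fitted line reaches zero distance (the x-intercept).
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("sampling dates must vary")
    rate = sxy / sxx
    if rate == 0:
        raise ValueError("no temporal signal: slope is zero")
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate


if __name__ == "__main__":
    # Toy tips sampled 2017.5-2020 with distances from a perfect clock of
    # 0.5 distance units per year since 2015.0 (hypothetical values).
    print(root_to_tip_regression([2017.5, 2018.0, 2019.0, 2020.0],
                                 [1.25, 1.5, 2.0, 2.5]))  # (0.5, 2015.0)
```

Real data scatter around the line, so the fit gives a rough rate and TMRCA that are mainly useful for checking temporal signal before fitting a full phylodynamic model.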

8
Temporal dynamics and acquisition of Shiga toxin subtype stx2a within Shiga toxin-producing Escherichia coli in England, 2016 to 2024

Hayles, E. H.; Rodwell, E. V.; Greig, D. R.; Jenkins, C.; Langridge, G. C.

2026-04-12 genetics 10.64898/2026.04.09.717390 medRxiv
Top 0.1% · Match score: 18.9%

Shiga toxin-producing Escherichia coli (STEC) are an important public health concern due to their association with foodborne gastroenteritis and severe outcomes including haemolytic uraemic syndrome (HUS), particularly linked to the stx2a subtype of the Shiga toxin. We investigated the temporal dynamics and acquisition of stx2a among STEC isolates submitted to the United Kingdom Health Security Agency (UKHSA) between 2016 and 2024. In total, 12,888 STEC whole-genome sequences and associated metadata were analysed. Overall, 31.9% of STEC isolates harboured stx2a, spanning 78 O serogroups, with a marked shift from STEC O157 to non-O157 serogroups over time. STEC O26:H11 and STEC O145:H28 were the primary drivers of the observed increases, most commonly carrying stx2a alone or in combination with stx1a. The widespread and increasing presence of stx2a across the STEC population in England highlights an emerging public health risk and demonstrates the value of routine genomic surveillance in monitoring high-severity Shiga toxin subtypes.
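The temporal analysis rests on the yearly proportion of isolates carrying stx2a and on the stx profile of stx2a-positive isolates (stx2a alone versus stx2a with stx1a). A minimal sketch, assuming each isolate is summarised as a sampling year plus its set of detected stx subtypes; the record layout, function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Isolate = Tuple[int, Set[str]]  # (year, stx subtypes detected)


def stx2a_share_by_year(isolates: Iterable[Isolate]) -> Dict[int, float]:
    """Proportion of isolates carrying stx2a in each year, in year order."""
    totals: Dict[int, int] = defaultdict(int)
    positives: Dict[int, int] = defaultdict(int)
    for year, subtypes in isolates:
        totals[year] += 1
        if "stx2a" in subtypes:
            positives[year] += 1
    return {year: positives[year] / totals[year] for year in sorted(totals)}


def stx2a_profiles(isolates: Iterable[Isolate]) -> Counter:
    """Count stx profiles (e.g. 'stx2a' vs 'stx1a+stx2a') among stx2a-positive isolates."""
    return Counter("+".join(sorted(subtypes))
                   for _, subtypes in isolates if "stx2a" in subtypes)


if __name__ == "__main__":
    toy = [(2016, {"stx1a"}), (2016, {"stx2a"}),
           (2017, {"stx2a", "stx1a"}), (2017, {"stx2a"}),
           (2017, {"stx2a"}), (2017, {"stx2c"})]
    print(stx2a_share_by_year(toy))
    print(stx2a_profiles(toy))
```

With the toy records, half of the 2016 isolates and three quarters of the 2017 isolates carry stx2a.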

9
SCCmecExtractor: A tool for extracting Staphylococcal Cassette Chromosome elements from Whole Genome Sequences

MacFadyen, A. C.

2026-03-31 microbiology 10.64898/2026.03.31.715619 medRxiv
Top 0.1% · Match score: 18.7%

Staphylococcal cassette chromosome (SCC) elements are mobile genetic elements that integrate at the rlmH gene and are predominantly responsible for methicillin resistance in staphylococci. Although SCCmec typing tools exist, none can extract the element sequence itself or explicitly classify SCC elements that lack methicillin resistance genes. Here we present SCCmecExtractor, a lightweight Python toolkit that identifies SCC element boundaries through degenerate attachment site (att) pattern matching, extracts complete elements from whole-genome assemblies and characterises their mec and ccr gene content. Benchmarking on 7,297 genomes spanning 70 species across Staphylococcus and Mammaliicoccus demonstrated 100% typing concordance with the sccmec tool [1] on 1,454 S. aureus genomes. The tool extracted 1,562 SCC elements from 1,454 S. aureus, 5,295 non-aureus Staphylococcus and 548 Mammaliicoccus genomes, achieving effective extraction rates (excluding assembly-limited genomes and those lacking valid ccr pairs) of 87.3% for S. aureus, 58.8% for non-aureus Staphylococcus, and 61.9% for Mammaliicoccus. Notably, 616 of the 1,562 extracted elements (39.4%) were non-mec SCC elements lacking methicillin resistance genes, an often-overlooked class of mobile element. Non-mec SCC prevalence increased from 12.2% in S. aureus to 55.6% in non-aureus Staphylococcus and 76.0% in Mammaliicoccus, revealing a substantial reservoir of SCC diversity beyond methicillin resistance. SCCmecExtractor is freely available via PyPI, Docker and Singularity under an MIT licence.
Impact Statement
Staphylococcal cassette chromosome (SCC) elements are mobile genetic elements responsible for methicillin resistance in staphylococci and are central to methicillin-resistant Staphylococcus aureus (MRSA) epidemiology. Existing tools focus on typing SCCmec from assemblies but cannot extract the element itself, limiting our ability to comprehensively monitor and examine these elements.
SCCmecExtractor is a lightweight, portable tool that detects the attachment sites SCC elements require to integrate into the genome, extracts the SCC element whether or not it carries a mec gene, and characterises its gene content. Applied across 7,297 genomes spanning two genera, we demonstrate that non-mec SCC elements are the dominant SCC class outside S. aureus, a finding enabled by systematic extraction and classification of SCC elements regardless of mec gene content. SCCmecExtractor provides the research community with an accessible, biologically grounded, confidence-first approach to SCC element analysis across all staphylococcal and mammaliicoccal species.
Data Summary
The code for this pipeline is available at: https://github.com/AlisonMacFadyen/SCCmecExtractor, with a Docker image available at: https://hub.docker.com/repository/docker/alisonmacfadyen/sccmecextractor and a PyPI package at: https://pypi.org/project/sccmecextractor/. All reference databases are bundled with the tool. Benchmarking genome accessions: 1,454 S. aureus, 5,295 non-aureus Staphylococcus, and 548 Mammaliicoccus genomes from NCBI. A complete list of genome accessions is provided as supplementary data (Supplementary Table S1). Extracted SCC elements can be obtained from Zenodo: 10.5281/zenodo.19355206
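SCCmecExtractor is described as locating SCC boundaries by degenerate att-site pattern matching. The sketch below shows only the general technique: translate an IUPAC-degenerate motif into a regular expression and scan both strands. The motif in the example is a hypothetical toy, not the tool's att pattern, and these functions are not part of SCCmecExtractor's API.

```python
import re
from typing import List, Tuple

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def degenerate_to_regex(pattern: str) -> "re.Pattern":
    """Compile a degenerate motif; the zero-width lookahead also reports overlapping matches."""
    body = "".join(IUPAC[base] for base in pattern.upper())
    return re.compile(f"(?=({body}))")


def find_att_sites(seq: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for every match of a
    degenerate motif on either strand of seq."""
    seq = seq.upper()
    regex = degenerate_to_regex(pattern)
    width = len(pattern)
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    rc = seq.translate(_COMPLEMENT)[::-1]
    for m in regex.finditer(rc):
        # A match starting at i on the reverse complement covers forward
        # positions len(seq) - i - width .. len(seq) - i - 1.
        hits.append((len(seq) - m.start() - width, "-"))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical motif ATCRTA (R = A or G) in a toy sequence.
    print(find_att_sites("CCATCGTAGGTTTATGATCC", "ATCRTA"))
```

A motif that is its own reverse complement will be reported twice at the same position, once per strand; a real boundary search would also pair the left and right att sites flanking an element.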

10
Flex-It: A global standardised genotyping framework for Shigella flexneri

Hawkey, J.; Nodari, C. S.; Iqbal, Z.; Hunt, M.; Wick, R. R.; Chong, C. E.; Jenkins, C.; Howden, B. P.; Holt, K.; Weill, F.-X.; Baker, K. S.; Ingle, D. J.

2026-04-20 microbiology 10.64898/2026.04.17.719127 medRxiv
Top 0.1% · Match score: 18.0%

Shigella flexneri is the leading causative agent of shigellosis globally. The public health threat posed by S. flexneri is compounded by its emergence as a sexually transmissible infection, the importance of international travel in driving dissemination, and the increasing prevalence of antimicrobial resistance (AMR). A rapid, robust and scalable computational method that is readily implementable across jurisdictions is needed to enhance genomic surveillance and systematically explore the population structure of this WHO priority pathogen, particularly as vaccine development efforts are underway. Here, we present Flex-It, a genomic framework and genotyping scheme implemented in Mykrobe for S. flexneri serotypes 1-5, X and Y, compatible with previous approaches used to describe the population structure of S. flexneri. To develop Flex-It, we curated a retrospective dataset of 5,819 publicly available S. flexneri genomes. We characterised the global population structure of S. flexneri, exploring geographical and temporal traits, and showed the granular diversity of AMR and serotype profiles. We applied Flex-It to >13,000 genomes routinely generated by public health laboratories in Australia, the UK and the USA over a ten-year period. We found significant genotype diversity in all three locations, including the emergence of genotypes carrying resistance to all major drugs currently used for treatment. Flex-It is an open-source, novel genotyping method that characterises S. flexneri and its ciprofloxacin resistance determinants in under 1 minute from both short- and long-read whole-genome sequencing data, and it gives the community a standardised nomenclature to monitor the emergence and spread of S. flexneri lineages.
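Genotyping schemes of this kind assign each genome a hierarchical label from lineage-defining SNP markers. The sketch below illustrates hierarchical assignment in general: report the deepest label whose own marker and every ancestor marker are observed. The marker table, positions and alleles are hypothetical, and the code is neither the Flex-It scheme nor Mykrobe's interface.

```python
from typing import Dict, Tuple

# Hypothetical marker table: genotype label -> (reference position, lineage-defining allele).
# Labels are hierarchical, so "1.2.3" is nested inside "1.2", which is nested inside "1".
MARKERS: Dict[str, Tuple[int, str]] = {
    "1": (1000, "T"),
    "1.2": (2500, "G"),
    "1.2.3": (4100, "A"),
    "2": (1800, "C"),
}


def assign_genotype(calls: Dict[int, str],
                    markers: Dict[str, Tuple[int, str]] = MARKERS) -> str:
    """Return the deepest genotype whose own marker and all ancestor markers
    (those present in the table) are observed in the isolate's allele calls."""
    best, best_depth = "unassigned", 0
    for genotype, (pos, allele) in markers.items():
        if calls.get(pos) != allele:
            continue
        parts = genotype.split(".")
        ancestors = [".".join(parts[:i]) for i in range(1, len(parts))]
        # Reject calls that skip a level of the hierarchy.
        consistent = all(
            calls.get(markers[a][0]) == markers[a][1]
            for a in ancestors if a in markers
        )
        if consistent and len(parts) > best_depth:
            best, best_depth = genotype, len(parts)
    return best


if __name__ == "__main__":
    print(assign_genotype({1000: "T", 2500: "G", 4100: "A"}))
    print(assign_genotype({1000: "T", 4100: "A"}))
```

Requiring ancestor markers makes the call conservative: in the second example the isolate falls back to genotype 1 because the intermediate 1.2 marker is missing.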

11
Decoding resistance: interpretable machine learning to predict ciprofloxacin resistance in Shigella spp

Gohari, M. R.; Zhang, P.; Villegas, A.; Rosella, L. C.; Patel, S. N.; Hopkins, J. P.; Duvvuri, V. R.

2026-04-11 infectious diseases 10.64898/2026.04.07.26350353 medRxiv
Top 0.1% · Match score: 14.8%

Antimicrobial resistance (AMR) is a growing global public health threat that complicates the treatment and control of bacterial infections. Shigella spp., a leading cause of bacterial diarrhea worldwide, has increasingly exhibited resistance to multiple antimicrobial agents commonly recommended for the treatment of severe shigellosis. Although conventional antimicrobial susceptibility testing (AST) remains the reference standard, it is time-consuming and provides limited insight into the genetic mechanisms underlying resistance. Whole-genome sequencing (WGS) has emerged as a complementary approach for AMR detection by enabling direct identification of resistance genetic determinants encoded in bacterial genomes. Machine learning (ML) methods applied to genomic features such as k-mers have shown promise for predicting resistance phenotypes from WGS data; however, applications to Shigella remain limited. In this study, we developed and evaluated an interpretable ML framework for predicting ciprofloxacin resistance using k-mer features derived from WGS data of 1,424 Shigella isolates collected in Ontario, Canada, between 2018 and 2025. K-mers were extracted from known gene targets associated with ciprofloxacin resistance, including the chromosomal quinolone resistance-determining regions (QRDRs) of gyrA and parC, and plasmid-mediated determinants (qnr). Supervised ML approaches were trained and compared. We evaluated the influence of k-mer length (k = 11, 15, 21 and 31) on predictive performance and model interpretability, and compared models based on chromosomal determinants alone with models incorporating both chromosomal and plasmid-mediated determinants. The Random Forest classifier achieved the most consistent performance across models. Inclusion of plasmid-mediated determinants improved predictive accuracy relative to chromosomal-only models.
Although differences across k-mer lengths were modest, k = 11 produced the highest area under the receiver operating characteristic curve (AUC) and the lowest Brier score. SHAP analyses localized high-impact features within the QRDRs of gyrA and parC, supporting biological interpretability. These findings demonstrate that biologically informed k-mer-based ML models can accurately and transparently predict ciprofloxacin resistance in Shigella, supporting their potential integration into genomic AMR surveillance and digital public health frameworks.
Author summary
In this study, we used genome sequencing data to develop machine learning models that predict ciprofloxacin resistance in Shigella directly from bacterial DNA. We focused on small DNA fragments (k-mers) derived from known resistance genes and mutations. Among the approaches tested, a Random Forest model showed the most consistent performance. Combining chromosomal mutations with plasmid-mediated resistance genes improved prediction accuracy and helped identify key genetic regions associated with resistance. These findings demonstrate that machine learning applied to genomic data can accurately and interpretably predict antibiotic resistance, supporting its potential use in genomic surveillance and public health monitoring.
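The modelling pipeline above has two simple, checkable ingredients: binary k-mer features drawn from target regions (gyrA, parC and qnr sequences would be the inputs) and the Brier score used to compare models. A minimal sketch with toy sequences; training the classifier itself (for example with scikit-learn's `RandomForestClassifier`) is omitted, and the function names are hypothetical rather than taken from the authors' code.

```python
from typing import List, Sequence, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_vocabulary(training_seqs: Sequence[str], k: int) -> List[str]:
    """Sorted union of k-mers across training target-region sequences."""
    vocab: Set[str] = set()
    for seq in training_seqs:
        vocab |= kmer_set(seq, k)
    return sorted(vocab)


def featurise(seq: str, vocab: Sequence[str], k: int) -> List[int]:
    """Binary presence/absence vector over a fixed k-mer vocabulary."""
    present = kmer_set(seq, k)
    return [1 if kmer in present else 0 for kmer in vocab]


def brier_score(probabilities: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted resistance probability and
    the observed label (1 = resistant, 0 = susceptible); lower is better."""
    if len(probabilities) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


if __name__ == "__main__":
    vocab = build_vocabulary(["ACGTAC", "ACGTTC"], k=3)  # toy target regions
    print(vocab)
    print(featurise("ACGTAC", vocab, k=3))
    print(brier_score([0.9, 0.2], [1, 0]))
```

Fixing the vocabulary on the training set keeps feature columns aligned between training and test isolates; k-mers absent from the vocabulary are simply ignored at prediction time.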

12
Genomic Surveillance of Third-Generation Cephalosporin-Resistant Klebsiella pneumoniae in Tunisian AMR Surveillance System Hospitals

Itani, D.; Smaoui, H.; Thabet, L.; Zribi, M.; Dhraief, S.; Kanzari, L.; Meftah, K.; Achour, W.; Baker, D. J.; Moss, C.-J.; Philips, L. T.; Foster-Nyarko, E.; Boutiba-Ben Boubaker, I.; Holt, K. E.

2026-04-10 infectious diseases 10.64898/2026.04.08.26350452 medRxiv
Top 0.1%
14.5%

Third-generation cephalosporin (3GC)-resistant Klebsiella pneumoniae are an increasing public health threat in Tunisia, yet there are limited data on the circulating lineages and antimicrobial resistance (AMR) determinants underlying this threat. Here, we employed whole-genome sequencing (WGS) in the Tunisian AMR surveillance system (TARSS) to characterize the 3GC resistance mechanisms, population structure, virulence, and transmission across three participating sentinel hospitals in Tunis and Ben Arous. We sequenced a balanced sample of stored 3GC-resistant (3GCR) isolates from blood and urine collected between 2018 and 2022. Of 322 sequenced isolates, 286 (89%) were confirmed as K. pneumoniae, representing 28.5% of all stored 3GCR isolates. The population structure was diverse (68 sublineages) and distinct between hospitals, although several globally distributed sublineages were detected across sites (SL383, SL101, SL307, SL15). Extended-spectrum β-lactamase (ESBL) genes were detected in 77% of genomes, with blaCTX-M-15 (65.4%) and blaCTX-M-14 (8%) dominant at all sites and across diverse sublineages. AmpC genes occurred in 9% of genomes and carbapenemase genes in 19.6% (blaOXA-48, 14.7%; blaNDM-5, 4.5%; blaNDM-1, 3.8%), with carbapenemases mainly observed amongst SL147 and SL383 at Hospital B (41.7%). Despite sequencing fewer than a third of the unique 3GCR infections in each hospital, we identified 24 probable nosocomial transmission clusters involving 64 isolates. Each cluster was restricted to a single hospital, although many were detected across multiple wards in the same hospital. The acquired virulence-associated locus (ICEKp) encoding yersiniabactin was common (48.6%). Hypervirulence-associated markers (encoding aerobactin, salmochelin, and/or hypermucoidy) were rare (8.7%) but increasing over time.
These were mostly found in sublineages in which convergence of ESBL and hypervirulence has been reported in other settings (including SL147, SL101 and SL383), suggesting international dissemination of convergent strains. These findings show sustained ward-level nosocomial transmission of 3GCR K. pneumoniae lineages and site-specific differences in ESBL and carbapenemase burdens, which call for targeted infection prevention and control and for future routine integration of WGS into TARSS.
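The abstract reports probable nosocomial transmission clusters but does not state how they were defined. One common way to sketch such clusters is single-linkage grouping of isolates whose pairwise SNP distance falls at or below a cut-off, then reporting which hospitals each cluster spans. The 25-SNP threshold, the input layout and the function name below are hypothetical assumptions; published definitions often also use sublineage and time-window criteria.

```python
from itertools import combinations

# Hypothetical SNP cut-off: the abstract does not state the threshold used.
SNP_THRESHOLD = 25


def transmission_clusters(hospital_of, snp_dist, threshold=SNP_THRESHOLD):
    """Single-linkage clusters of isolates with pairwise SNP distance <= threshold.

    hospital_of: dict isolate_id -> hospital label
    snp_dist: dict frozenset({a, b}) -> pairwise SNP distance (missing pairs are not linked)
    Returns sorted (members, hospitals) tuples for clusters with >= 2 isolates.
    """
    parent = {i: i for i in hospital_of}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(hospital_of, 2):
        d = snp_dist.get(frozenset((a, b)))
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in hospital_of:
        groups.setdefault(find(i), []).append(i)

    clusters = []
    for members in groups.values():
        if len(members) >= 2:
            # Report hospitals spanned, so single-hospital restriction can be checked.
            hospitals = sorted({hospital_of[m] for m in members})
            clusters.append((sorted(members), hospitals))
    return sorted(clusters)


hosp = {"A": "H1", "B": "H1", "C": "H2", "D": "H1"}
dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 150, frozenset(("A", "D")): 200,
    frozenset(("B", "C")): 100, frozenset(("B", "D")): 10, frozenset(("C", "D")): 300,
}
print(transmission_clusters(hosp, dist))  # [(['A', 'B', 'D'], ['H1'])]
```

Single linkage chains isolates together (A and D join via B here even though A-D is 200 SNPs), which is why the hospitals spanned by each cluster are reported rather than assumed.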

13
Culture-independent identification and serotyping of Streptococcus pneumoniae by targeted metagenomics in pleural fluid samples

Smith, S. A. M.; Rockett, R. J.; Oftadeh, S.; Tam, K. K.-G.; Payne, M.; Golubchik, T.; Sintchenko, V.

2026-04-16 epidemiology 10.64898/2026.04.13.26350812 medRxiv
Top 0.1%
14.3%

Streptococcus pneumoniae is the leading cause of empyema and pneumonia in children, and monitoring the effectiveness of polyvalent pneumococcal vaccines has been essential for controlling invasive pneumococcal disease (IPD) in children and elderly adults. Conventional serotyping of pneumococci has relied on the Quellung reaction following laboratory culture; more recently, however, whole-genome sequencing (WGS) has been implemented in many reference laboratories to enhance traditional typing. Pleural fluid samples from cases with empyema are often culture-negative, limiting the utility of WGS and requiring polymerase chain reaction (PCR) or 16S rRNA sequencing to detect S. pneumoniae. These molecular methods have limited sensitivity and capacity to characterise pneumococci in clinical samples, especially in specimens with low pathogen abundance. This study applied capture-based targeted next-generation sequencing (tNGS) to identify and characterise S. pneumoniae directly from pleural fluid samples. A total of 51 pleural fluid samples, including 39 known positive fluids collected from IPD cases between 2018 and 2025 in New South Wales, Australia, were subjected to tNGS with a custom probe panel. tNGS results were benchmarked against molecular serotyping. Our tNGS approach achieved 100% sensitivity and specificity in detecting S. pneumoniae. Serotyping results were concordant with PCR, and 95% (37/39) of S. pneumoniae PCR-positive pleural fluid cases could be serotyped using tNGS, whereas standard molecular methods could determine the serotype in only 56% (22/39) of samples. By improving the ability to directly identify and serotype IPD-associated S. pneumoniae serotypes in difficult-to-culture pleural fluids by approximately 39 percentage points, this tNGS approach can significantly enhance laboratory surveillance of IPD as well as our understanding of vaccine effectiveness.
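The headline comparison is simple arithmetic on the reported counts. The sketch below, using only numbers from the abstract plus textbook sensitivity/specificity definitions, shows that the "39%" figure is a difference in percentage points: exactly 15/39 ≈ 38.5 points, which rounds to 39 when computed from the rounded 95% and 56%. The helper names are illustrative.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage, unrounded."""
    return 100 * numerator / denominator


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of true positives detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of true negatives called negative: TN / (TN + FP)."""
    return tn / (tn + fp)


tngs_typed = pct(37, 39)          # ~94.87%, reported as 95%
pcr_typed = pct(22, 39)           # ~56.41%, reported as 56%
gain_pp = tngs_typed - pcr_typed  # ~38.46 percentage points

print(f"{tngs_typed:.1f}% vs {pcr_typed:.1f}%, gain {gain_pp:.1f} pp")
# 94.9% vs 56.4%, gain 38.5 pp
print(sensitivity(tp=39, fn=0))  # 1.0: all 39 known positives detected
```

The negative-sample counts needed for specificity are not broken down in the abstract, so the specificity helper is shown only as a definition.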

14
Long-read sequencing of Mycobacterium tuberculosis is comparable to short-read sequencing for antimicrobial resistance prediction and epidemiological studies.

Colpus, M.; Baker, C. S.; Roghi, E.; Hong, H. N.; Trieu, P. P.; Thu, D. D. A.; Hall, A.; Fowler, P. W.; Walker, T. M.; Spies, R.; Webster, H.; Westhead, J.; Thai, H.; Turner, R. D.; Peto, T. E.; Quang, N. L.; Thuong, N. T. T.; Omar, S. V.; Crook, D. W.

2026-04-08 microbiology 10.64898/2026.04.08.717216 medRxiv
Top 0.1%
13.3%

Background: Short-read genetic sequencing technologies (mainly Illumina) have been used extensively for around a decade for Mycobacterium tuberculosis complex (MTBC) outbreak analysis and genomic drug susceptibility testing (gDST), with the result that Illumina has become the de facto gold standard. Long-read sequencing, as exemplified by Oxford Nanopore Technologies (ONT), offers the prospect of faster, simpler, and portable sequencing. In this work, we carry out the largest comparison to date of how well Illumina and ONT technologies sequence MTBC samples, making use of R10.4 ONT flow cells, updated basecalling models and deep-learning variant calling. Methods: A total of 508 samples were sequenced using both short- and long-read platforms. All samples originated from South Africa or Vietnam, were over-selected for drug resistance, and included several local outbreaks and a range of lineages. All samples had already been sequenced with Illumina. Samples with ≥50× read depth by Illumina were selected for sequencing by ONT on either the GridION or the PromethION platform. Bioinformatic processing was done using a modified online cloud platform, which performed reference-based variant calling and catalogue-based gDST, and identified related samples via SNP counting to inform outbreak detection. The lineages and gDST predictions obtained by short- and long-read sequencing were compared for all samples, as were all putative clusters identified via SNP counting. For convenience, Illumina was used as the reference method. Findings: Of the 508 samples, 425 (83.7%) had sufficient read depth to permit comparison between the two sequencing technologies. The assigned lineages were identical for 407/425 (95.8%) samples, and all discordances were due to mixed lineages being identified by one technology. Evidence of non-tuberculous mycobacterium (NTM) subpopulations was found in nine samples.
Using Illumina as the reference method, the very major error (VME) rate of ONT for predicting resistance to all 15 drugs is 1.0% (0.6-1.5%), whilst the major error (ME) rate is 1.7% (1.3-2.2%), with an unclassified rate of 6.9% (6.3-7.5%). These rates are below the thresholds specified by the CLSI. Considering each of the 15 drugs individually, VME and ME point estimates were ≤3% in 29/30 cases and ≤1.5% in most (25/30). Filtering out all samples containing mixtures left 382 isolates. By appropriately masking the reference genome, we obtained a mean SNP distance between the two platforms of 0.13 (median zero) for the same sample, and for 376/382 samples (98.4%, CI: 96.6-99.4%) the difference was ≤1 SNP. Because of this high concordance in SNP identification, few differences were observed in the 43 putative clusters among 172 isolates. Interpretation: The differences between the two sequencing platforms for the key clinical outputs are so small that they are now within the tolerances set by regulatory agencies. Provided the sequencing is of sufficient quality, we have therefore reached a threshold whereby sequencing data from long- and short-read platforms can be aggregated. This will enable large-scale analyses by national and international public health agencies whilst allowing the MTBC community to take advantage of the portability and speed of long-read sequencing. Funding: The NIHR Health Protection Research Unit: Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford (NIHR200915), a partnership between the UK Health Security Agency (UKHSA) and the University of Oxford; the National Institute for Health and Care Research Biomedical Research Centre: Oxford (BRC); and the Ellison Institute of Technology, Oxford Ltd. The CRyPTIC project was funded by Wellcome [214560/Z/18/Z], a Wellcome Trust/Newton Fund-MRC Collaborative Award (200205/Z/15/Z), and the Bill & Melinda Gates Foundation Trust (OPP1133541).
Research in context. Evidence before this study: We conducted a PubMed Central full-text search for "tuberculosis" AND ("drug resistance prediction" OR "drug susceptibility prediction") AND ("genome" OR "genomic" OR "genotypic") AND ("ont" OR "oxford nanopore") between 2022 and 2026 (conducted 1 April 2026). This returned 62 papers, of which six used both Illumina and ONT sequencing. One of these, published in 2023, directly compared the performance of the two platforms on 151 M. tuberculosis isolates oversampled for resistance, yielding comparative results for the earlier-generation ONT flow cell (R9.4.1) and basecaller (guppy version 5.0.16). Another, published in 2026, investigated a targeted next-generation sequencing panel of 20 amplicons using ONT sequencing on R10.4.1 flow cells with guppy 6.4.6. It compared results on 71 isolates against phenotypic data and Illumina whole-genome sequencing (for 53 isolates) but had low rates of resistance, with fewer than five resistant isolates for every drug except isoniazid. Two other small feasibility studies (10 and 13 samples, respectively) compared ONT with Illumina, also using earlier-generation ONT flow cells and basecalling technology. Two further studies compared Illumina with ONT for direct sputum sequencing but did not investigate the comparative performance of the two platforms for variant-call accuracy, resistance prediction, and outbreak detection. Illumina sequencing technology is widely used for genomic sequence analysis in research, clinical and public health contexts and has consequently become the de facto reference standard for generating whole-genome sequence data.
Whilst previous studies established the promise and limitations of long-read (ONT) sequencing as an alternative to short-read sequencing (mainly Illumina), the enhanced performance arising from newer flow cells (e.g. R10.4.1), V14 chemistry, and the latest basecallers (dorado v4.3.0/5.0.0) has not been analysed. Nor has any ONT analysis incorporating the new deep-learning variant callers been evaluated in a large-scale comparative study. Thus, it is currently unclear whether data generated by either platform can be used safely in aggregated analyses for research and for clinical or public health services. Added value of this study: We compared how well short-read (Illumina) and long-read (ONT) sequencing platforms identify genetic variants in M. tuberculosis, predict antituberculous drug resistance and recognise outbreaks. The long reads were generated using the latest-generation ONT R10.4.1 flow cells, V14 chemistry and super-high-accuracy basecalling (dorado v4.3.0/5.0.0), and analysed with a bioinformatics pipeline built around the Clair3 deep-learning variant caller. A total of 508 clinical samples were sequenced using both technologies, substantially more than in previous studies. The sampling frame was also much larger than in previous investigations and included a large proportion of isolates resistant to first-line and second-line antibiotics as well as bedaquiline, thus providing greater statistical power for resistance prediction than before. In particular, the inclusion of bedaquiline resistance provided evidence useful for predicting resistance to this newly deployed drug for treating multidrug-resistant (MDR) TB. We find that the differences between technologies are small, meaning that either technology can be used safely on its own, and services using both technologies can confidently aggregate the data for analysis.
Implications of all the available evidence: This will benefit local, regional and international organisations, particularly public health agencies, which often use a mix of the two main sequencing technologies for characterising TB whole-genome sequences. It also opens up the sequence-based diagnostic market to greater competition, particularly if the observed performance can be replicated for other pathogen species.
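The Findings above report VME and ME rates for ONT against Illumina and per-sample SNP distances after reference masking. The sketch below shows one conventional way to compute these quantities, under stated assumptions: calls are coded 'R', 'S' or 'U' (unclassified), unclassified test calls are excluded from the VME/ME denominators, confidence intervals use the Wilson score method (the abstract does not say which method was used), and masking is represented as a set of hypothetical position indices on aligned pseudo-sequences.

```python
import math


def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (assumed CI method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def error_rates(pairs):
    """(count, denominator) for VME, ME and unclassified from (reference, test) calls.

    VME: reference 'R' but test 'S', over reference-R pairs with a classified test call.
    ME:  reference 'S' but test 'R', over reference-S pairs with a classified test call.
    Assumption: test calls of 'U' count only towards the unclassified rate.
    """
    pairs = list(pairs)
    unclassified = sum(t == "U" for _, t in pairs)
    ref_r = [t for r, t in pairs if r == "R" and t in ("R", "S")]
    ref_s = [t for r, t in pairs if r == "S" and t in ("R", "S")]
    return {
        "VME": (sum(t == "S" for t in ref_r), len(ref_r)),
        "ME": (sum(t == "R" for t in ref_s), len(ref_s)),
        "unclassified": (unclassified, len(pairs)),
    }


def masked_snp_distance(seq_a: str, seq_b: str, masked: set[int]) -> int:
    """Count differing aligned positions, ignoring masked sites and 'N' calls."""
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in masked and a != b and "N" not in (a, b)
    )


calls = [("R", "R"), ("R", "S"), ("S", "S"), ("S", "R"), ("S", "S"), ("R", "U")]
print(error_rates(calls))  # {'VME': (1, 2), 'ME': (1, 3), 'unclassified': (1, 6)}
print(masked_snp_distance("ACGTA", "ACGAA", masked=set()))  # 1
print(masked_snp_distance("ACGTA", "ACGAA", masked={3}))    # 0
```

Returning (count, denominator) pairs rather than bare fractions keeps the numbers needed to pass each rate to `wilson_ci`.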

15
TrIdent - An R package to automate transductomics analysis of virus-like particle mediated DNA mobilization

Maier, J.; Gin, C.; Rabasco, J.; Spencer, W.; Bass, A.; Duerkop, B. A.; Callahan, B.; Kleiner, M.

2026-04-01 genomics 10.64898/2026.03.31.715651 medRxiv
Top 0.2%
10.4%

Background: Transduction is a form of horizontal gene transfer in which bacterial DNA is packaged and transferred by virus-like particles (VLPs). Transductomics is a sequencing-based method used to detect DNA carried by VLPs. During transductomics analysis, reads from a sample's ultra-purified VLPs are mapped to metagenomic contigs assembled from the same sample's whole community. The read mapping produces coverage patterns that require time-consuming manual inspection and classification, which makes the method infeasible for datasets with many samples. Results: We developed a novel algorithm, TrIdent (Transduction Identification), that uses pattern-matching to automate transductomics data analysis and is available as an R package (https://jlmaier12.github.io/TrIdent/). Because no equivalent software exists, we compared TrIdent's classifications of transductomics datasets to classifications made by human classifiers. TrIdent's classifications were generally comparable to the manual classifications on a previously generated, manually classified transductomics dataset. When applied to newly generated transductomics data from the murine microbiota, TrIdent agreed with two independent human classifiers as much as the two human classifiers agreed with each other. TrIdent classified transductomics datasets in a fraction of the time needed by human classifiers, and its classifications are fully reproducible. We used TrIdent to explore three murine gut transductomes and found that bacterial DNA associated with the Oscillospiraceae and Turicibacteraceae families was highly enriched in the DNA packaged by VLPs compared with the whole-community metagenomes. Conclusions: The TrIdent software is a more accessible, more efficient, and more reproducible alternative to the manual inspection of read coverage patterns previously required for transductomics data analysis.
To demonstrate the application of TrIdent, we analyzed transductomics datasets from murine fecal pellets and showed that specific low abundance bacterial families appear to be heavily involved in transduction.
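The abstract says TrIdent automates classification of read-coverage patterns by pattern-matching but does not describe its templates. As a purely hypothetical illustration of the general idea (not TrIdent's algorithm or API), the sketch below scores a contig's windowed VLP read coverage against three made-up templates (no coverage, uniform coverage, and a single elevated block) using mean absolute error, and prefers the block template only if it clearly beats the simpler ones.

```python
def _mae(observed, model):
    """Mean absolute error between observed coverage and a model profile."""
    return sum(abs(o - m) for o, m in zip(observed, model)) / len(observed)


def classify_coverage(cov, min_len=2, improvement=0.5):
    """Label windowed VLP read coverage on one contig (hypothetical templates).

    'none': coverage near zero; 'uniform': flat at the contig mean;
    'elevated_block': one contiguous region above a lower background.
    The block template must cut the best simple template's error by the
    `improvement` factor, since its extra parameters always fit at least as well.
    Brute-force O(n^3) search; fine for a sketch, slow for long contigs.
    """
    if not cov:
        raise ValueError("coverage profile must be non-empty")
    n = len(cov)
    mean = sum(cov) / n
    label, best = "none", _mae(cov, [0.0] * n)
    uniform_err = _mae(cov, [mean] * n)
    if uniform_err < best:
        label, best = "uniform", uniform_err

    block_best = float("inf")
    for start in range(n):
        for end in range(start + min_len, n + 1):
            inside = cov[start:end]
            outside = cov[:start] + cov[end:]
            hi = sum(inside) / len(inside)
            lo = sum(outside) / len(outside) if outside else 0.0
            if hi <= lo:
                continue  # only regions elevated above the background count
            model = [lo] * start + [hi] * (end - start) + [lo] * (n - end)
            block_best = min(block_best, _mae(cov, model))

    if block_best < improvement * best:
        label = "elevated_block"
    return label


print(classify_coverage([0, 0, 0, 0, 0, 0]))                  # none
print(classify_coverage([10] * 8))                            # uniform
print(classify_coverage([0, 0, 0, 50, 52, 48, 50, 0, 0, 0]))  # elevated_block
```

A real transductomics workflow would likely also compare VLP coverage with whole-community coverage on the same contig; that normalisation step is omitted here.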

16
Global whole-genome phylogenomics of Nakaseomyces glabratus reveals admixture and refines sequence type-based classification

Adamu Bukari, A.-R.; Sidney, B.; Gerstein, A. C.

2026-04-04 evolutionary biology 10.64898/2026.04.03.716392 medRxiv
Top 0.2%
10.0%

Nakaseomyces glabratus is a globally distributed opportunistic fungal pathogen. An ongoing discussion in studies of N. glabratus population structure has been whether genetic clusters are best defined using multilocus sequence typing (MLST) or short-read whole-genome sequencing (WGS). To assess the concordance between MLST- and WGS-based phylogenies, we analyzed a dataset of 548 N. glabratus whole-genome sequences from 12 countries. Clusters identified from WGS largely recapitulated the MLST-defined sequence type (ST) groups: fourteen WGS clusters were composed of a single MLST ST, and the remainder contained STs with very closely related MLST profiles. We therefore propose a pragmatic naming convention, consistent with the system used in other microbial species, which assigns WGS cluster labels based on the primary ST. From the large WGS isolate dataset, we determined the prevalence of admixture and genomic variants. Interestingly, seven of the nine singleton isolates were admixed, in addition to 58 isolates from six different clusters. Aneuploidy was detected in 4% of isolates, most commonly in chrE, which contains ERG11, the gene encoding the enzyme targeted by azole antifungals. Aneuploid chromosomes did not exhibit elevated heterozygosity relative to the sequencing error rate, consistent with instability of extra chromosome copies. Copy number variants (CNVs) were found in 3% of isolates; some co-occurred with aneuploidies, and they were primarily identified on chrD, chrE, chrI, and chrM. Our findings demonstrate that deep splits between clusters preserve the utility of MLST ST designations for clade-level classification, yet underscore the value of WGS for high-resolution genomic analyses. Article Summary: There is an ongoing debate in studies of Nakaseomyces glabratus about whether traditional MLST analysis is sufficient to determine population structure, or whether the precision of whole-genome sequencing (WGS) is necessary.
We analyzed WGS data from 548 isolates from around the world and found very strong agreement between the two methods. We propose a hybrid naming system in which cluster names are based on the dominant MLST group. We used the WGS data to show that admixed isolates and those with extra chromosomes or CNVs are rare (<7% of isolates in each class) and are distributed throughout the phylogeny.
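The proposed convention labels each WGS cluster by its primary (most frequent) ST. A minimal sketch of that rule follows; the tie-break (lower ST number wins), the numeric suffix for clusters sharing a primary ST, and the `ST<n>-cluster` label format are all hypothetical choices, not the authors' scheme.

```python
from collections import Counter


def name_clusters(cluster_sts):
    """Label WGS clusters by their primary (most frequent) MLST ST.

    cluster_sts: dict cluster_id -> non-empty list of integer STs of member isolates.
    Hypothetical rules: ties go to the lower ST number, and a primary ST already
    used by an earlier cluster gets a numeric suffix. The label format is illustrative.
    """
    labels = {}
    seen = Counter()
    for cid in sorted(cluster_sts):
        counts = Counter(cluster_sts[cid])
        # Highest count first; on a tie, -st favours the lower ST number.
        primary, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        seen[primary] += 1
        suffix = "" if seen[primary] == 1 else f"-{seen[primary]}"
        labels[cid] = f"ST{primary}-cluster{suffix}"
    return labels


print(name_clusters({"c1": [3, 3, 3, 10], "c2": [7], "c3": [3, 3], "c4": [5, 2]}))
# {'c1': 'ST3-cluster', 'c2': 'ST7-cluster', 'c3': 'ST3-cluster-2', 'c4': 'ST2-cluster'}
```

Because clusters are processed in sorted ID order, suffixes are deterministic for a given input.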

17
Retrospective analysis of clinical and environmental genotyping reveals persistence of Pseudomonas aeruginosa in the water system of a large tertiary children's hospital in England

Sheth, E.; Case, L.; Shaw, F.; Dwyer, N.; Poland, J.; Wan, Y.; Larru, B.

2026-04-24 infectious diseases 10.64898/2026.04.23.26351604 medRxiv
Top 0.2%
10.0%

Background: Pseudomonas aeruginosa is a major cause of healthcare-associated infections in paediatric settings, where its persistence in moist environments such as hospital water and wastewater systems poses a particular risk to neonates and immunocompromised children. Aim: This study aimed to characterise the long-term survival and transmission of P. aeruginosa in a large tertiary children's hospital in England, knowledge that is crucial for developing strategies for water-safe care. Methods: Environmental P. aeruginosa isolates were collected from taps, sinks, showers, and baths in augmented care areas of a 330-bed tertiary children's hospital built to NHS water-safety standards. Clinical isolates were classified as invasive (blood, cerebrospinal fluid, and bronchoalveolar lavage) or non-invasive (respiratory, urine, ear, abdominal, and rectal surveillance). Variable number tandem repeat (VNTR) profiles and metadata were extracted from PDF reports, de-identified, deduplicated, and curated using Python and R. Findings: This retrospective study analysed nine-locus VNTR profiles of 457 P. aeruginosa isolates submitted to the UK Health Security Agency from a large tertiary children's hospital, identifying 56 isolate clusters (each with ≥2 isolates), of which 19 (34%) contained at least one invasive isolate. The most persistent cluster (Cluster 1, n=20) spanned July 2016 to September 2024 and contained environmental and clinical (invasive and non-invasive) isolates. Conclusion: These findings demonstrate long-term persistence of certain genotypes and temporal overlap between environmental and clinical isolates, highlighting the difficulty of detecting and eradicating P. aeruginosa in hospital water and wastewater systems and reinforcing the need for continuous, rigorous water system controls.
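The study groups isolates into clusters of ≥2 from nine-locus VNTR profiles and then looks at invasiveness, source and persistence over time. A minimal Python sketch of that kind of summary is shown below, assuming a cluster means isolates with identical profiles (the abstract does not state the matching rule) and a hypothetical record layout; the repeat counts are made up.

```python
from collections import defaultdict
from datetime import date

N_LOCI = 9  # nine-locus VNTR scheme, as in the abstract


def vntr_clusters(isolates, min_size=2):
    """Group isolates with identical nine-locus VNTR profiles (assumed rule).

    isolates: iterable of dicts with hypothetical keys 'id', 'profile' (9 repeat
    counts), 'date' (datetime.date), 'invasive' (bool) and 'source'
    ('clinical' or 'environmental').
    """
    groups = defaultdict(list)
    for iso in isolates:
        profile = tuple(iso["profile"])
        if len(profile) != N_LOCI:
            raise ValueError(f"{iso['id']}: expected {N_LOCI} loci, got {len(profile)}")
        groups[profile].append(iso)

    clusters = []
    for profile, members in groups.items():
        if len(members) < min_size:
            continue
        dates = [m["date"] for m in members]
        clusters.append({
            "profile": profile,
            "n": len(members),
            "has_invasive": any(m["invasive"] for m in members),
            "sources": sorted({m["source"] for m in members}),
            "first": min(dates),
            "last": max(dates),
        })
    # Most persistent (longest time span) clusters first.
    return sorted(clusters, key=lambda c: c["last"] - c["first"], reverse=True)


p = (2, 3, 4, 1, 5, 2, 3, 6, 2)  # hypothetical repeat counts
demo = [
    {"id": "E1", "profile": p, "date": date(2016, 7, 1), "invasive": False, "source": "environmental"},
    {"id": "C1", "profile": p, "date": date(2020, 1, 15), "invasive": True, "source": "clinical"},
    {"id": "C2", "profile": p, "date": date(2024, 9, 3), "invasive": False, "source": "clinical"},
    {"id": "C3", "profile": (1,) * 9, "date": date(2021, 5, 2), "invasive": False, "source": "clinical"},
]
top = vntr_clusters(demo)[0]
print(top["n"], top["has_invasive"], top["sources"])  # 3 True ['clinical', 'environmental']
```

Exact-match grouping is strict: a single missing or miscalled locus splits a cluster, so a tolerance for single-locus variants might be considered in practice.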

18
Evaluation of a multiplexed tiling PCR scheme for whole-genome amplification of hepatitis B virus using Oxford Nanopore sequencing

Brate, J.; Grande, E. G.; Pedersen, B. N.; Frengen, T. G.; Stene-Johansen, K.

2026-03-31 molecular biology 10.64898/2026.03.28.714721 medRxiv
Top 0.2%
9.1%

Here we evaluated the performance of a previously published tiling PCR primer scheme by Ringlander et al. (2022) for whole-genome amplification of hepatitis B virus (HBV) in combination with Oxford Nanopore sequencing. The primer set, originally developed for Ion Torrent sequencing, was adapted by removing platform-specific adapters and tested using clinical serum or plasma samples submitted for routine HBV genotyping and resistance testing. Two multiplexing strategies were compared: a single PCR pool containing all primers, and a two-pool strategy in which each pool contained non-overlapping amplicons. Sequencing reads were processed using a Nanopore analysis pipeline, and genome coverage and amplicon performance were compared across samples spanning a wide Ct range and representing HBV genotypes A-E. Across all samples, the median genome coverage was approximately 50%, although recovery varied widely, from complete failure to nearly full genomes. Combining all primers into a single PCR reaction, or separating overlapping amplicons into different reactions, had little overall impact on genome recovery, and no consistent differences between the two pooling strategies were observed. In contrast, amplification efficiency differed markedly between individual amplicons. Amplicons 1-5 generally produced higher sequencing depth, whereas amplicons 6-10 frequently showed low coverage and contributed to incomplete genome recovery. Genome coverage was strongly associated with Ct values, with higher coverage in samples with lower Ct values, while coverage was broadly similar across genotypes. These results demonstrate that the Ringlander et al. primer scheme can be adapted for multiplex PCR and Nanopore sequencing of HBV, but uneven amplicon performance limits consistent full-genome recovery and highlights the need for further optimization of HBV tiling PCR designs.
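The two-pool strategy separates overlapping tiling amplicons into different reactions, and performance is summarised as genome coverage. The sketch below illustrates both ideas under stated assumptions: amplicons are alternated between pools by start position (which keeps neighbours apart when each amplicon overlaps only its immediate neighbours), coordinates are linear and half-open, the circular HBV genome's wrap-around is ignored, and the amplicon coordinates and 20-read depth threshold are hypothetical rather than taken from Ringlander et al.

```python
def assign_pools(amplicons):
    """Alternate tiling amplicons between pools 1 and 2 by genomic start position.

    amplicons: list of (name, start, end), half-open linear coordinates.
    Assumes each amplicon overlaps only its immediate neighbours.
    """
    pools = {1: [], 2: []}
    for i, amp in enumerate(sorted(amplicons, key=lambda a: a[1])):
        pools[1 if i % 2 == 0 else 2].append(amp)
    return pools


def has_overlap(pool):
    """True if any two amplicons in the pool overlap.

    After sorting by start, any overlap implies an overlap between some
    consecutive pair, so checking consecutive pairs is sufficient.
    """
    ordered = sorted(pool, key=lambda a: a[1])
    return any(prev[2] > nxt[1] for prev, nxt in zip(ordered, ordered[1:]))


def genome_coverage(depth, min_depth=20):
    """Fraction of genome positions with at least min_depth reads (hypothetical threshold)."""
    return sum(d >= min_depth for d in depth) / len(depth)


amps = [("a1", 0, 400), ("a2", 300, 700), ("a3", 600, 1000), ("a4", 900, 1300)]
pools = assign_pools(amps)
print([a[0] for a in pools[1]], [a[0] for a in pools[2]])  # ['a1', 'a3'] ['a2', 'a4']
print(has_overlap(amps), has_overlap(pools[1]), has_overlap(pools[2]))  # True False False
print(genome_coverage([0, 25, 30, 10]))  # 0.5
```

For a circular genome with an odd number of amplicons, the last and first amplicons would land in the same pool and could overlap across the origin; an even amplicon count avoids this.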

19
A low-cost rpoB-based multiplex MAMA PCR for differentiation of the Klebsiella pneumoniae species complex

Sharmin, M.; Amin, A.; Rahman, H.; Janecko, N.; Saha, S. K.; Hooda, Y.; Tanmoy, A. M.; Saha, S.

2026-04-15 microbiology 10.64898/2026.04.14.718422 medRxiv
Top 0.2%
9.1%

The Klebsiella pneumoniae species complex (KpSC) is a clinically important group of closely related pathogens associated with invasive infections. The complex comprises seven members, which are often reported as K. pneumoniae, particularly in resource-limited settings. Accurate differentiation of KpSC members remains challenging because routine laboratory methods lack sufficient resolution, and approaches such as mass spectrometry and whole-genome sequencing (WGS) are not widely available. Consequently, the epidemiology and clinical significance of non-K. pneumoniae members of the KpSC remain underrecognized. We developed a conventional multiplex mismatch amplification mutation assay (MAMA) PCR targeting species- and subspecies-specific single-nucleotide polymorphisms in the housekeeping gene rpoB, with six primer sets for differentiation of common KpSC members. The assay was validated against 49 genomically characterized clinical isolates, after which 179 wastewater-derived isolates provisionally identified as Klebsiella spp. by standard microbiological methods were tested. Of these, 174 were assigned to specific KpSC members by the assay, while 5 produced inconclusive amplification patterns. A subset of 16 environmental isolates, including four of the five inconclusive isolates, was selected for WGS. All environmental isolates with interpretable MAMA PCR patterns were concordant with WGS, and the four inconclusive isolates were identified as Enterobacter spp. Overall, comparison of MAMA PCR with WGS showed 100% sensitivity and 100% specificity for all tested targets, at a cost of approximately US$1 per test. This rpoB-based multiplex MAMA PCR provides a simple, accurate, and low-cost approach for differentiation of KpSC members in routine laboratories and may support improved identification and surveillance in resource-limited settings.
Importance: The Klebsiella pneumoniae species complex (KpSC) has seven members but is often reported as a single organism in routine laboratories, masking clinically and epidemiologically important diversity. As a result, the contribution of non-K. pneumoniae KpSC members to human and environmental microbiology remains poorly defined, especially in low-resource settings. We developed a conventional multiplex mismatch amplification mutation assay (MAMA) PCR based on discriminatory rpoB single nucleotide polymorphisms for differentiation of common KpSC members using standard PCR and agarose gel electrophoresis. The assay demonstrated 100% sensitivity and 100% specificity against whole-genome sequencing and excluded non-Klebsiella environmental isolates initially identified as Klebsiella pneumoniae using standard microbiological procedures. With an estimated per-test cost of about US$1, this method offers an affordable and scalable option for laboratories seeking more accurate KpSC identification and improved surveillance.
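Reading a multiplex MAMA gel amounts to looking up the observed band pattern in a key and calling anything unmatched "inconclusive", as happened for 5 of the 179 wastewater isolates. The sketch below illustrates that logic and a concordance tally against WGS. The band labels (including an assumed rpoB control band) and the pattern-to-species key are hypothetical, since the abstract does not give the primer sets, band sizes or interpretation table.

```python
# Hypothetical band labels and key: the real primer sets, band sizes and
# pattern-to-species interpretation table are not given in the abstract.
PATTERN_KEY = {
    frozenset({"rpoB_ctrl", "Kpn"}): "K. pneumoniae",
    frozenset({"rpoB_ctrl", "Kqq"}): "K. quasipneumoniae subsp. quasipneumoniae",
    frozenset({"rpoB_ctrl", "Kqs"}): "K. quasipneumoniae subsp. similipneumoniae",
    frozenset({"rpoB_ctrl", "Kvv"}): "K. variicola subsp. variicola",
}


def call_isolate(bands):
    """Map the set of observed gel bands to a KpSC call, else 'inconclusive'."""
    return PATTERN_KEY.get(frozenset(bands), "inconclusive")


def concordance(mama_calls, wgs_calls):
    """Return (agreeing, interpretable, inconclusive) counts against WGS calls."""
    interpretable = [i for i, c in mama_calls.items() if c != "inconclusive"]
    agree = sum(mama_calls[i] == wgs_calls.get(i) for i in interpretable)
    return agree, len(interpretable), len(mama_calls) - len(interpretable)


mama = {
    "i1": call_isolate({"rpoB_ctrl", "Kpn"}),
    "i2": call_isolate({"rpoB_ctrl"}),  # no species band: inconclusive
    "i3": call_isolate({"rpoB_ctrl", "Kvv"}),
}
wgs = {"i1": "K. pneumoniae", "i2": "Enterobacter sp.", "i3": "K. variicola subsp. variicola"}
print(mama["i2"])              # inconclusive
print(concordance(mama, wgs))  # (2, 2, 1)
```

Keeping inconclusive calls out of the agreement count mirrors how the abstract reports concordance for isolates with interpretable patterns only.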

20
Systematic analysis of the type VII secretion system in Streptococcus gallolyticus subsp. gallolyticus reveals genomic diversity and functional associations

Calderon, G.; Tamang, J.; Woodfin, S.; Prah, I.; Hurdle, J.; Xu, Y.

2026-04-06 microbiology 10.64898/2026.04.05.716583 medRxiv
Top 0.3%
7.1%

Streptococcus gallolyticus subsp. gallolyticus (Sgg) is an opportunistic pathobiont associated with bacteremia, infective endocarditis, and colorectal cancer. However, the genomic diversity of this subspecies and the distribution of key virulence determinants, particularly the type VII secretion system (T7SS), remain poorly characterized. Here, we performed genomic analyses of 76 Sgg strains from diverse geographic and host origins. Core- and pan-genome analyses, multilocus sequence typing, and phylogenetic reconstruction revealed dominant sequence types (STs) that correlate with geographic origin or source of isolation. Furthermore, systematic characterization of the T7SS locus identified five new T7SS subtypes and demonstrated a strong association between T7SS subtype and ST. Through genome-wide searches, we also substantially expanded the known repertoire of T7SS LXG domain-containing polymorphic toxins (LXG toxins) in Sgg. The LXG toxins showed distinct distribution patterns across strains. Lastly, our data indicated that T7SS subtype was significantly associated with the biofilm formation capacity of Sgg strains. Together, these findings advance our understanding of Sgg genomic diversity, reveal substantial lineage-associated variation in T7SS architecture and effector repertoires, and suggest a previously unrecognized connection between T7SS and biofilm formation in Sgg.
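The abstract reports a strong association between T7SS subtype and ST but does not say which statistic was used. One standard effect size for association between two categorical labels is Cramér's V, derived from the Pearson chi-square statistic of the contingency table. A minimal sketch follows, assuming one (subtype, ST) label pair per strain; the labels are toy data.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Cramér's V for two categorical labels per strain, e.g. (T7SS subtype, ST).

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), where chi2 is the Pearson
    chi-square statistic of the observed contingency table.
    """
    pairs = list(pairs)
    n = len(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    cells = Counter(pairs)
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else float("nan")


perfect = [("A", "ST1")] * 5 + [("B", "ST2")] * 5
independent = [("A", "ST1"), ("A", "ST2"), ("B", "ST1"), ("B", "ST2")]
print(cramers_v(perfect))      # 1.0
print(cramers_v(independent))  # 0.0
```

V ranges from 0 (no association) to 1 (each subtype maps to one ST). With 76 strains spread over many STs, many expected cell counts will be small, so a significance test would usually rely on an exact or permutation approach rather than the chi-square approximation.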